Install packages we will need.

# install.packages("ggplot2")
# install.packages("sp")
# install.packages("raster")
# install.packages("rgdal")
# install.packages("brew")
# install.packages("systemfonts")
# install.packages("mapview")
# install.packages("spatstat")
# install.packages("tmaptools")
# install.packages("elevatr")
# install.packages("rgbif")
# install.packages("remotes")
# remotes::install_github("Nowosad/spDataLarge")
# install.packages("spData")
# install.packages("gstat")
# install.packages("fields")
# install.packages("foreign")
# install.packages("velox")

What is a geographic information system (GIS)? While often thought of as means to making maps, the more basic utility of a GIS is the ability to overlay diverse spatial data layers. This is something that flat maps didn’t allow us to do, and is best gotten across with an example. Let’s start with tabular data on the point locations of detention basins (boring data that provides a great example!). In the process of working with this data you will be introduced to several topics (coordinate systems, vector and raster data) that we treat in more depth later.

Let’s first consider coordinates. Web services like Google Maps have locations across much of the world mapped to lat-long coordinates. When these locations are polygons (e.g. Athens-Clarke County), Google Maps has an associated centroid (the approximate geographic center of the polygon) and bounding box. So we can actually retrieve the coordinates of named locations by geocoding, basically an API call. Since Google now charges, we’ll use OpenStreetMap instead :)

library(tmaptools)
acc<-geocode_OSM("Athens-Clarke County")
acc$coords
##         x         y 
## -83.35578  33.94587
acc$bbox
##      xmin      ymin      xmax      ymax 
## -83.53747  33.84802 -83.24083  34.03947

I have a bunch of herbarium data on the occurrences of rare plants associated with granite outcrops in the Southeast.

Unfortunately, until recently, botanists collecting these specimens did not report coordinates. Nevertheless, many times location names can be used to generate coordinates. Many names are not unique, but usually are if combined with county and state.

Let’s see if we can get correct coordinates for Echols Mill (now quarried) nearby in Oglethorpe county using just the place name.

em1<-geocode_OSM("Echols Mill")
em2<-geocode_OSM("Echols Mill, Oglethorpe, Georgia")

When we don’t supply more detail, we get the centroid of Echols county in south Georgia on the Florida line. But when we do, we get the Echols Mill I had in mind.

Note: You can tell by the difference in coordinates. Or use JPS suggestion below:

If you’re geocoding, you will definitely want to doublecheck your coordinates by overlaying them on a map of the region of interest to make sure the API is making the right associations! Below, we’ll see how to overlay points on polygons to do a spatial join to collect the info you would need to check against for accuracy.

To illustrate, let’s bring in a .csv that contains the point locations of detention basins in Sandy Springs, which is near Atlanta in Fulton county (http://www.fultoncountyga.gov/fcgis-geospatial-data).

# Bring in Sandy Springs detention basin points from Fulton County GIS website 
dbasins<-read.csv("Stormwater_Basins__Ponds__Hosted___Sandy_Springs_Georgia.csv")

# Take a look at the data
head(dbasins)
##           X        Y OBJECTID BasinType BasinSurf DateCreated ConstructType
## 1 -84.35676 33.88007     2801  Wet Pond   Surface        1996         Riser
## 2 -84.35658 33.88119     2802  Dry Pond   Surface        2007     Weir Wall
## 3 -84.35635 33.88207     2803  Dry Pond   Surface        1989     Weir Wall
## 4 -84.35419 33.88468     2804  Dry Pond   Surface        2001         Riser
## 5 -84.35284 33.88555     2805  Dry Pond   Surface        1989     Weir Wall
## 6 -84.35674 33.88303     2806  Wet Pond   Surface        1999         Riser
##   DamHeight
## 1         8
## 2         5
## 3         6
## 4        10
## 5         3
## 6         5

Notice that there is an X and Y column giving the longitude and latitude of the location of each pond. But also a series of attributes for each pond, both categorical and continuous.

# What are the categories?
unique(dbasins$BasinType)
## [1] Wet Pond  Dry Pond  Other BMP
## Levels: Dry Pond Other BMP Wet Pond
unique(dbasins$BasinSurf)
## [1] Surface           Underground Vault Parking Lot       N/A              
## Levels: N/A Parking Lot Surface Underground Vault
unique(dbasins$ConstructType)
## [1] Riser          Weir Wall      Stand Pipe     Concrete Flume Earthen Flume 
## [6] N/A           
## Levels: Concrete Flume Earthen Flume N/A Riser Stand Pipe Weir Wall

We can explore these attributes without considering space. How many ponds by type, whether surface or underground, and construction type?

# Aggregate data by categories
dbtypes<-with(dbasins,table(BasinType,BasinSurf,ConstructType))
dbtypes<-as.data.frame(dbtypes)
head(dbtypes)
##   BasinType   BasinSurf  ConstructType Freq
## 1  Dry Pond         N/A Concrete Flume    0
## 2 Other BMP         N/A Concrete Flume    0
## 3  Wet Pond         N/A Concrete Flume    0
## 4  Dry Pond Parking Lot Concrete Flume    0
## 5 Other BMP Parking Lot Concrete Flume    0
## 6  Wet Pond Parking Lot Concrete Flume    0

We can also graph non-spatial summaries of the attribute data. First by categories.

# Load graphics package
library(ggplot2)

# Make barplot of frequencies across categories
ggplot(data=dbtypes, aes(y=Freq,x=BasinType,fill=ConstructType)) + 
  geom_bar( stat="identity") + 
  facet_grid(~BasinSurf) + 
  labs(y = "Count") + 
  labs(x = "") + 
  theme(legend.position="bottom")

Now by categorical and continuous fields.

# Get age
dbasins$Age<-2019-dbasins$DateCreated

# Show age range by construction type 
ggplot(data=dbasins, aes(y=Age,x=ConstructType,fill=ConstructType)) + 
  geom_violin() + coord_flip() +
  theme(legend.position="") +
  labs(x = "")
## Warning: Removed 30 rows containing non-finite values (stat_ydensity).

To make use of the spatial component of the data in R, we need to convert the table to a spatial points dataframe (SPDF). I discuss data formats in more detail in the zoom recording.

# load sp package
library(sp)

# First let's look at arguments for the command SpatialPointsDataFrame
?SpatialPointsDataFrame

To convert to SPDF, we need the CORRECT descripton of the coordinate system.

# In this case the coordinates are lat-long so the value of the argument is
# CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
dbasin_pts<-SpatialPointsDataFrame(dbasins[,1:2],
              dbasins, 
              coords.nrs = numeric(0), 
              proj4string = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"))

Now let’s take a quick look to make sure the points seem okay. The plot shows basically the shape of north Fulton county which is where the points should be.

plot(dbasin_pts)

Let’s look at the description of the SPDF. Notice that we now have points that can be displayed in 2D space that also have attributes linked to them.

dbasin_pts
## class       : SpatialPointsDataFrame 
## features    : 1261 
## extent      : -84.44169, -84.25951, 33.87834, 34.00883  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 9
## names       :            X,           Y, OBJECTID, BasinType,         BasinSurf, DateCreated,  ConstructType, DamHeight, Age 
## min values  : -84.44169472, 33.87833621,     2801,  Dry Pond,               N/A,        1950, Concrete Flume,         0,   1 
## max values  : -84.25950623, 34.00882877,    11812,  Wet Pond, Underground Vault,        2018,      Weir Wall,       999,  69

Similar to the geocoding example above, with these spatial points, we can make API calls through R packages to acquire, for example, elevation data from the USGS Elevation Point Query Service.

library(elevatr)
dbasins_subset<-dbasin_pts[1:10,]
dbasins_subset<-get_elev_point(dbasins_subset)
## Note: Elevation units are in &units=Meters 
## Note:. The coordinate reference system is:
##  +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
dbasins_subset
## class       : SpatialPointsDataFrame 
## features    : 10 
## extent      : -84.35676, -84.34985, 33.88007, 33.88555  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 11
## names       :            X,           Y, OBJECTID, BasinType,         BasinSurf, DateCreated, ConstructType, DamHeight, Age, elevation, elev_units 
## min values  : -84.35675673, 33.88007369,     2801,  Dry Pond,           Surface,        1989,         Riser,         3,  12,     255.8,     meters 
## max values  : -84.34984609, 33.88555104,     2810,  Wet Pond, Underground Vault,        2007,     Weir Wall,        10,  30,    283.14,     meters

Now let’s bring in another layer. A shapefile of subwatershed polygons from USGS (U.S. Geological Survey, National Geospatial Program, 20200505, USGS National Hydrography Dataset Best Resolution (NHD) for Hydrologic Unit (HU) 8 - 03130001 (published 20200505): U.S. Geological Survey.). I downloaded this data using the National Map viewer website (https://viewer.nationalmap.gov/advanced-viewer/).

Note that the shapefile has become the standard format in which vector GIS data is usually served. Although developed originally by ESRI, it has become non-proprietary. IMPORTANT to note that a single shapefile is really a set of linked files (.shp, .dbf, .shx, .prj) that are all required, but are all called by using “filename.shp”.

# Load raster package
library(raster)

# Bring in the 12 digit hydrologic units (HUC)
hucs12<-shapefile("Hydrologic_Units_Subwatershed.shp")

What is the coordinate system of the hucs? Is it also lat-long (aka unprojected, aka geographic)? How can we find out?

hucs12
## class       : SpatialPolygonsDataFrame 
## features    : 759 
## extent      : -85.83745, -82.91979, 32.33565, 34.87591  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 21
## names       : OBJECTID,    HUC_8,     HUC_10,       HUC_12, ACRES, NCONTRB_A, HU_10_GNIS, HU_12_GNIS,   HU_10_DS,         HU_10_NAME, HU_10_MOD, HU_10_TYPE,     HU_12_DS,                       HU_12_NAME, HU_12_MOD, ... 
## min values  :        1, 03070101, 0307010101, 030701010101, 10055,         0,         NA,         NA, 0307010103,    Amicalola Creek,        DM,          S, 030701010102,  Acorn Creek-Chattahoochee River,        DM, ... 
## max values  :       99, 03150108, 0315010810, 031501081006,  9963,         0,         NA,         NA, 0315010901, Yellowjacket Creek,        UA,          S, 031600020808, Zachry Creek-Chattahoochee River,        UA, ...

So, we see that hucs12 includes long/lat data (as described by the crs value).

Now let’s overlay these layers to browse the data in R. You can zoom in or out and clip on points or polygons to see the attributes.

# Load mapview package to interactively view layers
library(mapview)

# View
mapview(dbasin_pts) + mapview(hucs12)
## Warning: sf layer has inconsistent datum (+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs).
## Need '+proj=longlat +datum=WGS84'

## Warning: sf layer has inconsistent datum (+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs).
## Need '+proj=longlat +datum=WGS84'

Let’s add the name of the 12 digit subwatershed to the attributes of the detention basin points to allow us to look at differences between subwatersheds. This is known as a spatial join.

# Get HUC 12 watershed name of each retention basin
dbn_hucs<-over(dbasin_pts,hucs12)
head(dbn_hucs)
##   OBJECTID    HUC_8     HUC_10       HUC_12 ACRES NCONTRB_A HU_10_GNIS
## 1      469 03130001 0313000112 031300011203 24078         0       <NA>
## 2      469 03130001 0313000112 031300011203 24078         0       <NA>
## 3      469 03130001 0313000112 031300011203 24078         0       <NA>
## 4      469 03130001 0313000112 031300011203 24078         0       <NA>
## 5      469 03130001 0313000112 031300011203 24078         0       <NA>
## 6      469 03130001 0313000112 031300011203 24078         0       <NA>
##   HU_12_GNIS   HU_10_DS      HU_10_NAME HU_10_MOD HU_10_TYPE     HU_12_DS
## 1       <NA> 0313000111 Peachtree Creek        UA          S 031300011204
## 2       <NA> 0313000111 Peachtree Creek        UA          S 031300011204
## 3       <NA> 0313000111 Peachtree Creek        UA          S 031300011204
## 4       <NA> 0313000111 Peachtree Creek        UA          S 031300011204
## 5       <NA> 0313000111 Peachtree Creek        UA          S 031300011204
## 6       <NA> 0313000111 Peachtree Creek        UA          S 031300011204
##    HU_12_NAME HU_12_MOD HU_12_TYPE META_ID STATES
## 1 Nancy Creek        UA          S    GA04     GA
## 2 Nancy Creek        UA          S    GA04     GA
## 3 Nancy Creek        UA          S    GA04     GA
## 4 Nancy Creek        UA          S    GA04     GA
## 5 Nancy Creek        UA          S    GA04     GA
## 6 Nancy Creek        UA          S    GA04     GA
##                                 GlobalID ShapeSTAre ShapeSTLen
## 1 {0B99DAC4-D3A3-42CF-BD1E-9F0E1D584501} 1047009918   187258.2
## 2 {0B99DAC4-D3A3-42CF-BD1E-9F0E1D584501} 1047009918   187258.2
## 3 {0B99DAC4-D3A3-42CF-BD1E-9F0E1D584501} 1047009918   187258.2
## 4 {0B99DAC4-D3A3-42CF-BD1E-9F0E1D584501} 1047009918   187258.2
## 5 {0B99DAC4-D3A3-42CF-BD1E-9F0E1D584501} 1047009918   187258.2
## 6 {0B99DAC4-D3A3-42CF-BD1E-9F0E1D584501} 1047009918   187258.2
# Add HUC 12 name to detention basin SPDF
dbasin_pts@data<-cbind(dbasin_pts@data,dbn_hucs$HU_12_NAME)

# Let's look at the dbasin.pts attribute table now
dbasin_pts
## class       : SpatialPointsDataFrame 
## features    : 1261 
## extent      : -84.44169, -84.25951, 33.87834, 34.00883  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 10
## names       :            X,           Y, OBJECTID, BasinType,         BasinSurf, DateCreated,  ConstructType, DamHeight, Age,               dbn_hucs$HU_12_NAME 
## min values  : -84.44169472, 33.87833621,     2801,  Dry Pond,               N/A,        1950, Concrete Flume,         0,   1, Crooked Creek-Chattahoochee River 
## max values  : -84.25950623, 34.00882877,    11812,  Wet Pond, Underground Vault,        2018,      Weir Wall,       999,  69,                       Nancy Creek

Now, we can write out dbasin_pts as a shapefile that can be brought in to other GIS software and will retain the new information we have collected.

library(raster)
shapefile(dbasin_pts,"dbasin_pts.shp", overwrite=T)
## Warning in rgdal::writeOGR(x, filename, layer, driver = "ESRI Shapefile", :
## Field names abbreviated for ESRI Shapefile driver

Or, the dbasin_pts spatial points data frame can be stored within a saved R workspace or .RData file. Please save this workspace so that you can load it in the next exercise.

save.image("intro.RData")

We have just worked through a simple example of overlaying spatial data layers in order to transfer data from one layer to another.